Annotation Time Stamps - Temporal Metadata from the Linguistic Annotation Process

Authors

Katrin Tomanek

Udo Hahn

Abstract

We describe the re-annotation of selected types of named entities (persons, organizations, locations) from the MUC7 corpus. The focus of this annotation initiative is on recording the time needed for the linguistic process of named entity annotation. Annotation times are measured on two basic annotation units – sentences vs. complex noun phrases. We gathered evidence that decision times are nonuniformly distributed over the annotation units, but do not differ substantially among annotators. These data seem to support the hypothesis that annotation times depend strongly on the inherent ‘hardness’ of each single annotation decision. We further show how such time-stamped information can be used for empirically grounded studies of selective sampling techniques, such as Active Learning. We directly compare Active Learning costs on the basis of token-based vs. time-based measurements. The data reveal that Active Learning keeps its competitive advantage over random sampling in both scenarios, though the difference is less marked for the time metric than for the token metric.
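To make the token-based vs. time-based cost comparison concrete, the following is a minimal sketch and not the authors' evaluation code. It assumes that each annotation unit carries a token count and a recorded annotation time in seconds; the `Unit` class, the `cumulative_cost` helper, and the toy values are hypothetical illustrations, not data from the corpus.

```python
from dataclasses import dataclass
from itertools import accumulate
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Unit:
    """One annotation unit (e.g. a sentence) with two cost measures."""
    tokens: int      # length of the unit in tokens
    seconds: float   # recorded annotation time (toy value, not corpus data)


def cumulative_cost(units: Sequence[Unit],
                    cost: Callable[[Unit], float]) -> List[float]:
    """Running total of annotation cost, in the order units were selected."""
    return list(accumulate(cost(u) for u in units))


# Toy units in the order a (hypothetical) sampling strategy selected them.
selected = [Unit(tokens=12, seconds=9.5),
            Unit(tokens=30, seconds=14.0),
            Unit(tokens=8, seconds=11.0)]

token_curve = cumulative_cost(selected, lambda u: u.tokens)
time_curve = cumulative_cost(selected, lambda u: u.seconds)

print(token_curve)  # [12, 42, 50]
print(time_curve)   # [9.5, 23.5, 34.5]
```

Plotting model performance against either curve gives the cost axis of a learning curve. Because the third toy unit is short but slow to annotate, the same selection order looks cheaper under the token metric than under the time metric, which is the kind of discrepancy the time stamps are meant to expose.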

Similar resources

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

Discourse-constrained Temporal Annotation

We describe an experiment on a temporal ordering task in this paper. We show that by selecting event pairs based on discourse structure and by modifying the pre-existent temporal classification scheme to fit the data better, we significantly improve inter-annotator agreement, as well as broaden the coverage of the task. We also present analysis of the current temporal classification scheme and ...

Timed Annotations - Enhancing MUC7 Metadata by the Time It Takes to Annotate Named Entities

We report on the re-annotation of selected types of named entities from the MUC7 corpus where our focus lies on recording the time it takes to annotate these entities given two basic annotation units – sentences vs. complex noun phrases. Such information may be helpful to lay the empirical foundations for the development of cost measures for annotation processes based on the investment in time ...

Tags Re-ranking Using Multi-level Features in Automatic Image Annotation

Automatic image annotation is a process in which computer systems automatically assign the textual tags related with visual content to a query image. In most cases, inappropriate tags generated by the users as well as the images without any tags among the challenges available in this field have a negative effect on the query's result. In this paper, a new method is presented for automatic image...

Getting at the Cognitive Complexity of Linguistic Metadata Annotation – A Pilot Study Using Eye-Tracking

We report on an experiment where the decision behavior of annotators issuing linguistic metadata is observed with an eyetracking device. As experimental conditions we consider the role of textual context and linguistic complexity classes. Still preliminary in nature, our data suggests that semantic complexity is much harder to deal with than syntactic one, and that full-scale textual context is...

Publication date: 2010
